Stylistic text classification using functional lexical features
Abstract
Most text analysis and retrieval work to date has focused on determining the topic of a text, that is, what it is about. However, a text also carries much useful information in its style, or how it is written, including information about its author, its purpose, the feelings it is meant to evoke, and more. This paper addresses the problem of classifying texts by style along several different dimensions, developing a new type of lexical feature based on taxonomies of the semantic functions of lexical items (words or phrases). We show the usefulness of such features for text classification by author, author personality, gender of literary characters, sentiment (positive/negative feeling), and scientific rhetorical style. We further show how functional features help in gaining insight into stylistic differences between texts.
∗ Casey Whitelaw was a visiting scholar at the IIT Linguistic Cognition Laboratory during November 2004.
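To make the idea of taxonomy-based lexical features concrete, here is a minimal Python sketch. It is not the paper's implementation: the category names, the word lists in `TAXONOMY`, and the function name `functional_features` are all hypothetical, chosen only for illustration. It assumes a feature is the relative frequency of each functional category among the tokens of a text.

```python
import re
from collections import Counter

# Hypothetical toy taxonomy (an assumption for illustration, not the paper's):
# each functional category maps to the lexical items that realize it.
TAXONOMY = {
    "conjunction/extension": {"and", "or", "but"},
    "conjunction/enhancement": {"because", "then", "so"},
    "modality/probability": {"may", "might", "perhaps"},
    "modality/obligation": {"must", "should"},
}


def functional_features(text):
    """Return the relative frequency of each taxonomy category in `text`."""
    # Simple lowercase word tokenizer; a real system would likely use more.
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for token in tokens:
        for category, items in TAXONOMY.items():
            if token in items:
                counts[category] += 1
    total = len(tokens) or 1  # avoid division by zero on empty input
    return {category: counts[category] / total for category in TAXONOMY}


features = functional_features(
    "You must go, and you should stay because it might rain."
)
```

The resulting dictionary can be turned into a fixed-order feature vector and passed to any standard classifier. Because each feature corresponds to a named category, the weights a classifier learns can be read back as statements about stylistic function.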
Related articles
Style of Religious Texts in 20th Century
In this study, we present the results of the investigation of diachronic stylistic changes in 20th century religious texts in two major English language varieties – British and American. We examined a total of 146 stylistic features, divided into three main feature sets: (average sentence length, Automated readability index, lexical density and lexical richness), part-of-speech frequencies and ...
Identifying Historical Period and Ethnic Origin of Documents Using Stylistic Feature Sets
Text classification is an important and challenging research domain. In this paper, identifying historical period and ethnic origin of documents using stylistic feature sets is investigated. The application domain is Jewish Law articles written in Hebrew-Aramaic. Such documents present various interesting problems for stylistic classification. Firstly, these documents include words from both la...
Stylistic Changes for Temporal Text Classification
This paper investigates stylistic changes in a set of Portuguese historical texts ranging from the 17th to the early 20th century and presents a supervised method to classify them per century. Four stylistic features – average sentence length (ASL), average word length (AWL), lexical density (LD), and lexical richness (LR) – were automatically extracted for each sub-corpus. The initial analysis of ...
A Stylistic Analysis of Lexicon in Ray Bradbury’s The Martian Chronicles
Ray Bradbury’s The Martian Chronicles is a futuristic, science fiction novel that chronicles the colonization of Mars by humans, projecting the United States’ colonial and immigrant past on to a symbolic future. Bradbury’s use of language is mostly picturesque and sensory. The present paper applies a text-oriented analysis of stylistic elements that construct meaning in the text and evoke the n...
Style Breach Detection: An Unsupervised Detection Model
This paper deals with the sub-task of PAN 2017 Author Identification, which is to detect style breaches for an unknown number of authors within a single document in English. The presented model is an unsupervised approach that detects style breaches and marks text boundaries on the basis of different stylistic features. This model will use some classical stylistic features like POS analysis and...
Journal:
- JASIST
Volume 58
Publication date: 2007